Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor PIDFile.join() #8

Open
wants to merge 4 commits into
base: main
Choose a base branch
from
Open

Conversation

webknjaz
Copy link
Member

This may be partially fixing #7.

@webknjaz webknjaz requested a review from jaraco March 11, 2020 20:24
@jaraco jaraco closed this Mar 11, 2020
@jaraco jaraco reopened this Mar 11, 2020
@jaraco
Copy link
Member

jaraco commented Mar 12, 2020

I'm a little unsure about this change. First thing is that join() used to be simple and very similar to wait(). After this change, it's much more complicated. Second is that join() used to have a simple function - wait until the file is gone. That fact that sometimes the file is gone and reappears during a check isn't necessarily the use-case that function was intended to serve.

But more importantly, I don't understand why the existing code is failing. When I run the tests on master with -k SIGHUP, it hangs on the very first test. The very first test is seeking to assert that the process exits on SIGHUP when not daemonized. But if there's still a pidfile after sending the signal, then the expected behavior isn't present. There's no race condition.

When I break into the process, I can see the test_signals.log looks like this:

[b'2020-03-11T20:16:27.584967'] (Bus 167dc5b8) Bus state: ENTER
[b'2020-03-11T20:16:27.585242'] (Bus 167dc5b8) PID 6409 written to '/Users/jaraco/code/public/cherrypy/magicbus/magicbus/test/test_signals.py.pid'.
[b'2020-03-11T20:16:27.585297'] (Bus 167dc5b8) Listening for SIGTERM.
[b'2020-03-11T20:16:27.585387'] (Bus 167dc5b8) Listening for SIGHUP.
[b'2020-03-11T20:16:27.585415'] (Bus 167dc5b8) Listening for SIGUSR1.
[b'2020-03-11T20:16:27.585435'] (Bus 167dc5b8) Bus state: IDLE
[b'2020-03-11T20:16:27.585453'] (Bus 167dc5b8) Bus state: START
[b'2020-03-11T20:16:27.585468'] (Bus 167dc5b8) Bus state: RUN
[b'2020-03-11T20:16:27.616095'] (Bus 167dc5b8) Caught signal SIGHUP.
[b'2020-03-11T20:16:27.616157'] (Bus 167dc5b8) SIGHUP caught while daemonized. Restarting.
[b'2020-03-11T20:16:27.616209'] (Bus 167dc5b8) Bus state: STOP
[b'2020-03-11T20:16:27.616234'] (Bus 167dc5b8) Bus state: IDLE
[b'2020-03-11T20:16:27.616254'] (Bus 167dc5b8) Bus state: EXIT
[b'2020-03-11T20:16:27.616398'] (Bus 167dc5b8) PID file removed: '/Users/jaraco/code/public/cherrypy/magicbus/magicbus/test/test_signals.py.pid'.
[b'2020-03-11T20:16:27.616423'] (Bus 167dc5b8) Waiting for child threads to terminate...
[b'2020-03-11T20:16:27.616457'] (Bus 167dc5b8) Bus state: EXITED
[b'2020-03-11T20:16:27.616480'] (Bus 167dc5b8) Re-spawning /Users/jaraco/code/public/cherrypy/magicbus/magicbus/test/test_signals.py tty
[b'2020-03-11T20:16:27.708287'] (Bus 5767f28d) Bus state: ENTER
[b'2020-03-11T20:16:27.708557'] (Bus 5767f28d) PID 6409 written to '/Users/jaraco/code/public/cherrypy/magicbus/magicbus/test/test_signals.py.pid'.
[b'2020-03-11T20:16:27.708601'] (Bus 5767f28d) Listening for SIGTERM.
[b'2020-03-11T20:16:27.708700'] (Bus 5767f28d) Listening for SIGHUP.
[b'2020-03-11T20:16:27.708730'] (Bus 5767f28d) Listening for SIGUSR1.
[b'2020-03-11T20:16:27.708753'] (Bus 5767f28d) Bus state: IDLE
[b'2020-03-11T20:16:27.708774'] (Bus 5767f28d) Bus state: START
[b'2020-03-11T20:16:27.708792'] (Bus 5767f28d) Bus state: RUN

It seems that at least on macOS, the asserted behavior just isn't happening. Sending SIGHUP is causing the process to restart and not shut down as expected.

@webknjaz
Copy link
Member Author

It seems that at least on macOS, the asserted behavior just isn't happening. Sending SIGHUP is causing the process to restart and not shut down as expected.

That's correct. It shouldn't shut down. SIGTERM should shut down the process, but SIGHUP should cause reload/restart and that's what it does. When the process receives SIGHUP, it first goes to a stopped state (with a PID file removed) and then it starts again (creating a PID file in the same location).

The very first test is seeking to assert that the process exits on SIGHUP when not daemonized. But if there's still a pidfile after sending the signal, then the expected behavior isn't present. There's no race condition.

No, this test tries to catch a moment "in-between". When reload reached stop state but start hasn't created a PID file yet.
I confirmed that there's a race condition by using watch -n.1 'stat /path/to/file.pid' in a separate terminal while running the test. At the beginning, there's one file but when SIGHUP is sent it disappears for a bit (almost unnoticeable) and gets recreated with a new created timestamp.

That fact that sometimes the file is gone and reappears during a check isn't necessarily the use-case that function was intended to serve.

Right, I didn't think that it might be the original intent.

But more importantly, I don't understand why the existing code is failing. When I run the tests on master with -k SIGHUP, it hangs on the very first test.

Fun fact: if you add -s to pytest, it doesn't hang. Maybe it's a race condition between tests and subprocess.

@webknjaz webknjaz mentioned this pull request Feb 11, 2021
@webknjaz webknjaz force-pushed the bugfixes/7-pidfile-join-too-fast branch 2 times, most recently from 4846960 to 60a41bb Compare February 11, 2021 21:00
@webknjaz webknjaz force-pushed the bugfixes/7-pidfile-join-too-fast branch 4 times, most recently from 5875b24 to bdb836c Compare March 13, 2021 22:32
@jaraco jaraco changed the base branch from master to main March 15, 2021 23:18
jaraco added a commit that referenced this pull request Mar 16, 2021
Co-authored-by: Jason R. Coombs <[email protected]>
@webknjaz webknjaz force-pushed the bugfixes/7-pidfile-join-too-fast branch from bdb836c to 698ba05 Compare March 21, 2021 01:47
This patch works around an issue described in
#7 by also comparing the stat of
PID files. Otherwise it misses cases when PID file
gets recreated faster that the check loop hits
another existence check.
@webknjaz webknjaz force-pushed the bugfixes/7-pidfile-join-too-fast branch from 698ba05 to 905b42a Compare March 21, 2021 02:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants